ON A ROLE OF PREDICTOR IN THE FILTERING STABILITY 
Abstract. When is a nonlinear filter stable with respect to its initial condition? In spite of the recent progress, this question still lacks a complete answer in general. Currently available results indicate that stability of the filter depends on the signal ergodic properties and the observation process regularity and may fail if either of the ingredients is ignored. In this note we address the question of stability in a particular weak sense and show that the estimates of certain functions are always stable. This is verified without dealing directly with the filtering equation and turns out to be inherited from certain one-step predictor estimates.



1. Introduction 

Consider the filtering problem for a Markov chain $(X,Y) = (X_n, Y_n)_{n\in\mathbb{Z}_+}$ with the signal $X$ and observation $Y$. The signal process $X$ is a Markov chain itself with the transition kernel $\Lambda(u,dx)$ and initial distribution $\nu$. The observation process $Y$ has the transition probability law

$$
P(Y_n \in B \mid X_{n-1}, Y_{n-1}) = \int_B \gamma(X_{n-1}, y)\,\varphi(dy), \qquad B \in \mathcal{B}(\mathbb{R}),
$$

where $\gamma(x,y)$ is a density with respect to a $\sigma$-finite measure $\varphi$ on $\mathbb{R}$. We set $Y_0 = 0$, so that a priori information on the signal state at time $n = 0$ is confined to the signal distribution $\nu$. The random process $(X,Y)$ is assumed to be defined on a complete probability space $(\Omega, \mathcal{F}, P)$. Let $(\mathcal{F}^Y_n)_{n\ge 0}$ be the filtration generated by $Y$:

$$
\mathcal{F}^Y_0 = \{\varnothing, \Omega\}, \qquad \mathcal{F}^Y_n = \sigma\{Y_1, \dots, Y_n\}.
$$

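It is well known that the regular conditional distribution $dP(X_n \le x \mid \mathcal{F}^Y_n) =: \pi_n(dx)$ solves the recursive Bayes formula, called the nonlinear filter:

$$
\pi_n(dx) = \frac{\int_{\mathbb{R}} \gamma(u, Y_n)\,\Lambda(u, dx)\,\pi_{n-1}(du)}{\int_{\mathbb{R}} \gamma(v, Y_n)\,\pi_{n-1}(dv)}, \qquad n \ge 1, \tag{1.1}
$$

subject to $\pi_0(dx) = \nu(dx)$. Clearly,

$$
\pi_n(f) := \int_{\mathbb{R}} f(x)\,\pi_n(dx)
$$

is a version of the conditional expectation $E(f(X_n) \mid \mathcal{F}^Y_n)$ for any measurable function $f = f(x)$ with $E|f(X_n)| < \infty$.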
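For a finite state space and a finite observation alphabet, with $\psi$ and $\varphi$ taken to be counting measures (an assumption made only for this illustration), the recursion (1.1) reduces to a normalized vector-matrix product. The Python sketch below is one way to write it. The function names and the array layout `Lam[u, x]` $= \Lambda(u,\{x\})$, `gamma[u, y]` $= \gamma(u,y)$ are hypothetical.

```python
import numpy as np


def filter_step(pi_prev, Lam, gamma_col):
    """One step of the nonlinear filter (1.1) on a finite state space.

    pi_prev   : pi_{n-1}, a probability vector indexed by the state u
    Lam       : transition matrix, Lam[u, x] = Lambda(u, {x})
    gamma_col : gamma(u, Y_n) for the new observation Y_n, indexed by u
    """
    weighted = gamma_col * pi_prev      # gamma(u, Y_n) pi_{n-1}(du)
    denom = weighted.sum()              # int gamma(v, Y_n) pi_{n-1}(dv)
    if denom == 0.0:
        # the 0/0 situation discussed below (1.1)
        raise ZeroDivisionError("observation has zero predicted probability")
    return weighted @ Lam / denom       # int gamma(u, Y_n) Lambda(u, dx) pi_{n-1}(du), normalized


def run_filter(nu, Lam, gamma, ys):
    """Return [pi_0, pi_1, ..., pi_n] for observation symbols ys = (Y_1, ..., Y_n)."""
    pis = [np.asarray(nu, dtype=float)]
    for y in ys:
        pis.append(filter_step(pis[-1], Lam, gamma[:, y]))
    return pis


# Two states, binary observations; gamma[u, y] = P(Y_n = y | X_{n-1} = u).
Lam = np.array([[0.9, 0.1], [0.2, 0.8]])
gamma = np.array([[0.8, 0.2], [0.3, 0.7]])
pis = run_filter([0.5, 0.5], Lam, gamma, [0, 1, 1])
```

For a vector `f` of function values, `pis[n] @ f` gives $\pi_n(f)$.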
Assume $\nu$ is unknown and the filter (1.1) is initialized with a probability distribution $\bar\nu$ different from $\nu$. Denote the corresponding solution by $\bar\pi = (\bar\pi_n)_{n\ge 0}$. Obviously, an arbitrary choice of $\bar\nu$ may not be admissible. It makes sense to choose $\bar\nu$ such that $\bar\pi_n(dx)$ preserves the properties of a probability distribution, i.e. $\int_B \bar\pi_n(dx) \ge 0$ for any measurable set $B \in \mathcal{B}(\mathbb{R})$ and $\int_{\mathbb{R}} \bar\pi_n(dx) = 1$ for each $n \ge 1$, $P$-a.s. This would be the case if the right-hand side of (1.1) does not lead to a $0/0$ uncertainty with positive probability. As explained in the next section, the latter is provided by the relation $\nu \ll \bar\nu$, which is assumed to be in force hereafter. In fact, it plays an essential role in the proof of the main result.

The sequence $\bar\pi = (\bar\pi_n)_{n\ge0}$ of random measures generally differs from $\pi = (\pi_n)_{n\ge 0}$. The estimate $\bar\pi_n(f)$ of a particular function $f$ is said to be stable if

$$
\lim_{n\to\infty} E\,|\pi_n(f) - \bar\pi_n(f)| = 0 \tag{1.2}
$$

holds for any admissible pair $(\nu, \bar\nu)$.
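When the model is known, the quantity in (1.2) can be estimated by simulation. The sketch below assumes a finite state space and a finite observation alphabet with counting reference measures, and all of its names are hypothetical. It samples $(X, Y)$ under $P$ (so $X_0 \sim \nu$), runs (1.1) from both $\nu$ and $\bar\nu$ on the same observation path, and averages $|\pi_n(f) - \bar\pi_n(f)|$ over paths. This is a Monte Carlo estimate for one particular model, not a proof of stability.

```python
import numpy as np


def filter_path(nu, Lam, gamma, ys):
    """pi_0, ..., pi_n from (1.1) on a finite state space (counting reference measures)."""
    pi = np.asarray(nu, dtype=float)
    out = [pi]
    for y in ys:
        w = gamma[:, y] * pi
        pi = w @ Lam / w.sum()
        out.append(pi)
    return out


def mc_stability_error(nu, nu_bar, Lam, gamma, f, n_steps, n_paths, seed=0):
    """Monte Carlo estimate of E|pi_n(f) - pibar_n(f)| for n = 0, ..., n_steps under P."""
    rng = np.random.default_rng(seed)
    n_states, n_obs = gamma.shape
    f = np.asarray(f, dtype=float)
    acc = np.zeros(n_steps + 1)
    for _ in range(n_paths):
        x = rng.choice(n_states, p=nu)                   # X_0 ~ nu
        ys = []
        for _ in range(n_steps):
            ys.append(rng.choice(n_obs, p=gamma[x]))     # Y_n is drawn given X_{n-1}
            x = rng.choice(n_states, p=Lam[x])           # X_n ~ Lambda(X_{n-1}, .)
        pis = filter_path(nu, Lam, gamma, ys)
        pibars = filter_path(nu_bar, Lam, gamma, ys)
        acc += np.array([abs(p @ f - q @ f) for p, q in zip(pis, pibars)])
    return acc / n_paths


Lam = np.array([[0.9, 0.1], [0.2, 0.8]])
gamma = np.array([[0.8, 0.2], [0.3, 0.7]])
errs = mc_stability_error([0.9, 0.1], [0.1, 0.9], Lam, gamma,
                          f=[0.0, 1.0], n_steps=30, n_paths=200)
```

In this particular example the transition matrix has strictly positive entries, so the decay of `errs` is expected. The same routine can be pointed at non-mixing chains to probe (1.2) numerically.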

The verification of (1.2) in terms of $\Lambda(u,dx)$, $\gamma(x,y)$, $\varphi(dy)$ is quite a nontrivial problem. It is far from being completely understood, in spite of the extensive research during the last decade.

For a bounded $f$, (1.2) is closely related to ergodicity of $\pi = (\pi_n)_{n\ge0}$, viewed as a Markov process on the space of probability measures. In the late 1950s D. Blackwell, motivated by information theory problems, conjectured in [?] that $\pi$ has a unique invariant measure in the particular case of an ergodic Markov chain $X$ with a finite state space and noiseless observations $Y_n = h(X_n)$, where $h$ is a fixed function. This conjecture was found to be false by T. Kaijser [15]. In the continuous time setting, H. Kunita addressed the same question in [16] for a filtering model with a general Feller-Markov process $X$ and observations

$$
Y_t = \int_0^t h(X_s)\,ds + W_t, \tag{1.3}
$$

where the Wiener process $W = (W_t)_{t\ge0}$ is independent of $X$. According to [16], the filtering process $\pi = (\pi_t)_{t\ge 0}$ inherits ergodic properties from $X$ if the tail $\sigma$-algebra of $X$ is $P$-a.s. empty. Unfortunately, this assertion remains questionable due to a gap in its proof (see [1]).

Notice that (1.2) for bounded $f$ also follows from

$$
\lim_{n\to\infty}\|\pi_n - \bar\pi_n\|_{tv} = 0, \quad P\text{-a.s.}, \tag{1.4}
$$

where $\|\cdot\|_{tv}$ is the total variation norm. Typically, this stronger type of stability holds when $X$ is an ergodic Markov chain with the state space $S \subseteq \mathbb{R}$ (or $\mathbb{R}^d$, $d \ge 1$) and its transition probability kernel $\Lambda(u,dx)$ is absolutely continuous with respect to a $\sigma$-finite reference measure $\psi(dx)$,

$$
\Lambda(u,dx) = \lambda(u,x)\,\psi(dx),
$$

while the density $\lambda$ satisfies the so-called mixing condition

$$
0 < \lambda_* \le \lambda(u,x) \le \lambda^*, \qquad \forall\, x, u, \tag{1.5}
$$

with a pair of positive constants $\lambda_*$ and $\lambda^*$. Then (see [?]),

$$
\limsup_{n\to\infty} \frac{1}{n} \log \|\pi_n - \bar\pi_n\|_{tv} \le -\frac{\lambda_*}{\lambda^*}, \quad P\text{-a.s.} \tag{1.6}
$$
The condition (1.5) was recently relaxed in [?], where (1.6) was verified with $\lambda_*$ replaced by

$$
\lambda := \int_S \operatorname*{ess\,inf}_{x\in S} \lambda(u,x)\,\mu(u)\,\psi(du),
$$

with $\mu(u)$ being the invariant density of the signal relative to $\psi(du)$.
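For a finite state space with counting reference measure $\psi$, $\lambda(u,x)$ is just the transition matrix, and both constants can be computed directly. The sketch below uses a hypothetical helper under this finite-state assumption. It returns $\lambda_*$, $\lambda^*$, the invariant distribution $\mu$, and the relaxed constant $\lambda = \sum_u \min_x \lambda(u,x)\,\mu(u)$. The example matrix has a zero entry, so (1.5) fails ($\lambda_* = 0$), while $\lambda > 0$.

```python
import numpy as np


def mixing_constants(Lam):
    """lambda_*, lambda^*, invariant law mu and the relaxed constant for a finite chain."""
    Lam = np.asarray(Lam, dtype=float)
    lam_lo, lam_hi = Lam.min(), Lam.max()          # the constants in (1.5)
    # mu solves mu Lam = mu: left eigenvector for the eigenvalue closest to 1
    w, v = np.linalg.eig(Lam.T)
    mu = np.real(v[:, np.argmin(np.abs(w - 1.0))])
    mu = mu / mu.sum()
    lam_relaxed = float(Lam.min(axis=1) @ mu)      # sum_u min_x lambda(u, x) mu(u)
    return lam_lo, lam_hi, mu, lam_relaxed


Lam = np.array([[0.5, 0.5, 0.0],
                [0.3, 0.3, 0.4],
                [0.2, 0.4, 0.4]])
lam_lo, lam_hi, mu, lam_relaxed = mixing_constants(Lam)
# lam_lo == 0.0, so (1.5) fails; mu = (13, 15, 10)/38 and lam_relaxed = 6.5/38 > 0
```
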

The mixing condition, including its weaker form, implies geometric ergodicity of the signal (see [?]). However, in general, ergodicity (and even geometric ergodicity) itself does not imply stability of the filter (see counterexamples in [?], [1]). If the signal process $X$ is compactly supported, the density $\lambda(u,x)$ usually corresponds to the Lebesgue measure or a purely atomic reference measure $\psi(dx)$. Signals with a non-compact state space do not fit the mixing condition framework, since an appropriate reference measure is hard to find and sometimes does not exist (as for the Kalman-Bucy filter).

In non-compact or non-ergodic settings, the filtering stability can be verified under additional structural assumptions on $(X,Y)$. In this connection, we mention that the Kalman-Bucy filter is stable for controllable and observable linear systems (see e.g. [10], [20], [19, Sections 14.6 and 16.2]). Similarly, in the nonlinear case certain relations between $\lambda(u,x)$ and $\gamma(x,y)$ provide (1.4) (see e.g. [?]).

In summary, stability of the nonlinear filter stems from a delicate interplay of the signal ergodic properties and the observation "quality". If one of these ingredients is removed, the other should be strengthened in order to keep the filter stable. Notably, all the available results verify (1.2) via (1.4) and thus require restrictive assumptions on the signal structure. Naturally, this raises the following question: are there functions $f$ for which (1.2) holds with "minimal" constraints on the signal model?

In this note, we give examples of functions for which this question has an affirmative answer. It turns out that (1.2) holds if $\nu \ll \bar\nu$ and the integral equation with respect to $g$,

$$
f(x) = \int_{\mathbb{R}} g(y)\,\gamma(x,y)\,\varphi(dy), \tag{1.7}
$$

has a bounded solution. The proof of this fact relies on the martingale convergence theorem rather than on a direct analysis of the filtering equation (1.1).
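Suppose, as an illustrative assumption, that the observation alphabet is finite and $\varphi$ is counting measure. Then (1.7) becomes the linear system $f = \Gamma g$ with $\Gamma[x, y] = \gamma(x, y)$, and any solution is automatically bounded. The hypothetical helper below looks for an exact solution by least squares and reports failure when $f$ is not in the range of $\Gamma$. In the second example $\Gamma$ has identical rows, meaning the observations are uninformative, and no non-constant $f$ can be reached.

```python
import numpy as np


def solve_integral_equation(Gamma, f, tol=1e-10):
    """Solve sum_y g(y) Gamma[x, y] = f(x) for all x, i.e. (1.7) with counting measure phi.

    Returns g, or None when f is not in the range of Gamma.
    """
    Gamma = np.asarray(Gamma, dtype=float)
    f = np.asarray(f, dtype=float)
    g, *_ = np.linalg.lstsq(Gamma, f, rcond=None)
    if np.max(np.abs(Gamma @ g - f)) > tol:
        return None
    return g


g = solve_integral_equation([[0.8, 0.2], [0.3, 0.7]], [0.0, 1.0])       # f = indicator of state 1
g_none = solve_integral_equation([[0.5, 0.5], [0.5, 0.5]], [0.0, 1.0])  # observations carry no information
```
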

The precise formulations and other generalizations with their proofs are given in Section 2. Several nonstandard examples are discussed in Section 3.



2. Preliminaries and the main result 

For notational convenience, we assume that the pair $(X,Y)$ is a coordinate process defined on the canonical measurable space $(\Omega, \mathcal{F})$ with $\Omega = \mathbb{R}^\infty \times \mathbb{R}^\infty$ and $\mathcal{F} = \mathcal{B}(\mathbb{R}^\infty \times \mathbb{R}^\infty)$, where $\mathcal{B}$ stands for the Borel $\sigma$-algebra. Let $P$ be a probability measure on $(\Omega, \mathcal{F})$ such that $(X,Y)$ is a Markov process with the transition kernel $\gamma(u,y)\Lambda(u,dx)\varphi(dy)$ and the initial distribution $\nu(dx)\delta_{\{0\}}(dy)$, where $\delta_{\{0\}}(dy)$ is the point measure at zero. Let $\bar P$ be another probability measure on $(\Omega, \mathcal{F})$ such that $(X,Y)$ is a Markov process with the same transition law and the initial distribution $\bar\nu(dx)\delta_{\{0\}}(dy)$. Hereafter, $E$ and $\bar E$ denote expectations relative to $P$ and $\bar P$, respectively. By the Markov property of $(X,Y)$,

$$
\nu \ll \bar\nu \implies P \ll \bar P \quad\text{and}\quad \frac{dP}{d\bar P}(x,y) = \frac{d\nu}{d\bar\nu}(x_0), \quad \bar P\text{-a.s.}
$$
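We assume that $\mathcal{F}$ is completed with respect to $\bar P$. Denote $\mathcal{F}^Y = \bigvee_{n\ge0} \mathcal{F}^Y_n$ and let $P^Y_n$, $\bar P^Y_n$ and $P^Y$, $\bar P^Y$ be the restrictions of $P$, $\bar P$ on $\mathcal{F}^Y_n$ and $\mathcal{F}^Y$, respectively. Obviously, $P \ll \bar P$ implies $P^Y_n \ll \bar P^Y_n$ and $P^Y \ll \bar P^Y$, with the densities

$$
\varrho_n := \frac{dP^Y_n}{d\bar P^Y_n} = \bar E\Big(\frac{d\nu}{d\bar\nu}(X_0)\,\Big|\,\mathcal{F}^Y_n\Big) \quad\text{and}\quad \varrho_\infty := \frac{dP^Y}{d\bar P^Y} = \bar E\Big(\frac{d\nu}{d\bar\nu}(X_0)\,\Big|\,\mathcal{F}^Y\Big).
$$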
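Consider a finite state space and a finite observation alphabet with counting reference measures (an illustrative assumption). Here $P^Y_n$ and $\bar P^Y_n$ have probability mass functions $p_\nu(y_1,\dots,y_n)$ and $p_{\bar\nu}(y_1,\dots,y_n)$, so $\varrho_n$ evaluated on a path is their ratio. Each likelihood is the product of the denominators in (1.1). The sketch below, with hypothetical names, computes $\varrho_n$ this way. The accompanying check compares it with $\bar E\big(\frac{d\nu}{d\bar\nu}(X_0)\mid\mathcal{F}^Y_n\big)$, computed from the $\bar P$-posterior of $X_0$.

```python
import numpy as np


def log_likelihood(nu, Lam, gamma, ys):
    """log p(y_1, ..., y_n) when X_0 ~ nu, accumulated from the denominators of (1.1)."""
    pi = np.asarray(nu, dtype=float)
    ll = 0.0
    for y in ys:
        w = gamma[:, y] * pi
        c = w.sum()              # predicted probability of y given y_1, ..., y_{n-1}
        ll += np.log(c)
        pi = w @ Lam / c
    return ll


def rho(nu, nu_bar, Lam, gamma, ys):
    """rho_n = dP^Y_n / dPbar^Y_n evaluated at the path ys (ratio of likelihoods)."""
    return float(np.exp(log_likelihood(nu, Lam, gamma, ys)
                        - log_likelihood(nu_bar, Lam, gamma, ys)))


Lam = np.array([[0.9, 0.1], [0.2, 0.8]])
gamma = np.array([[0.8, 0.2], [0.3, 0.7]])
nu, nu_bar, ys = [0.7, 0.3], [0.4, 0.6], [0, 1, 1, 0]

# Cross-check: Ebar(dnu/dnubar(X_0) | y) = sum_i nu(i) L_i / sum_i nubar(i) L_i,
# where L_i is the likelihood of ys given X_0 = i.
L = [np.exp(log_likelihood(e, Lam, gamma, ys)) for e in ([1.0, 0.0], [0.0, 1.0])]
cond_exp = (nu[0] * L[0] + nu[1] * L[1]) / (nu_bar[0] * L[0] + nu_bar[1] * L[1])
```

Working in log space avoids underflow of the likelihoods on long paths.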

Let $\bar\pi_n(dx)$ be the solution of (1.1) subject to $\bar\nu$, considered on $(\Omega, \mathcal{F}, \bar P)$, so that it is a version of the conditional distribution $\bar P(X_n \le x \mid \mathcal{F}^Y_n)$. Since $P \ll \bar P$, $\bar\pi_n$ satisfies (1.1) on $(\Omega, \mathcal{F}, P)$ as well.

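In the sequel, we have to operate with $\varrho_n\pi_n(dx)$ as a random object defined on $(\Omega, \mathcal{F}, \bar P)$. Since $\bar P \ll P$ is not assumed, $\pi_n$ cannot be defined properly on $(\Omega, \mathcal{F}, \bar P)$ by applying the previous arguments. However, the product $\varrho_n\pi_n$ is well defined on $(\Omega, \mathcal{F}, \bar P)$. Indeed, let $\Gamma$ denote the set where $\varrho_n\pi_n$ is well defined. We adopt the convention that the product vanishes on $\{\varrho_n = 0\}$, so that $\{\varrho_n = 0\} \subseteq \Gamma$. Notice that $\Gamma \in \mathcal{F}^Y_n$, and so $\bar P(\Gamma) = \bar P^Y_n(\Gamma)$. Now, by the Lebesgue decomposition of $\bar P^Y_n$ with respect to $P^Y_n$,

$$
\bar P^Y_n(\Gamma) = \int_{\Gamma\cap\{\varrho_n>0\}} \varrho_n^{-1}\,dP^Y_n + \bar P^Y_n\big(\{\varrho_n = 0\}\cap\Gamma\big).
$$

Since both $\pi_n$ and $\varrho_n$ are defined $P$-a.s., $P^Y_n(\Gamma) = 1$ holds. Moreover, $P^Y_n(\varrho_n > 0) = 1$, since $P^Y_n(\varrho_n = 0) = \int_{\{\varrho_n = 0\}} \varrho_n\,d\bar P^Y_n = 0$. Hence,

$$
\int_{\Gamma\cap\{\varrho_n>0\}} \varrho_n^{-1}\,dP^Y_n = \int_{\{\varrho_n>0\}} \varrho_n^{-1}\,dP^Y_n = \int_{\{\varrho_n>0\}} \varrho_n^{-1}\varrho_n\,d\bar P^Y_n = \bar P^Y_n(\varrho_n > 0).
$$

The second term equals $\bar P^Y_n(\varrho_n = 0)$ because $\{\varrho_n = 0\} \subseteq \Gamma$. That is, $\bar P(\Gamma) = 1$.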

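For $g: \mathbb{R} \to \mathbb{R}$ with $E|g(Y_n)| < \infty$ and $\bar E|g(Y_n)| < \infty$, let us define the predicting estimates

$$
\eta_{n|n-1}(g) = E\big(g(Y_n) \mid \mathcal{F}^Y_{n-1}\big) \quad\text{and}\quad \bar\eta_{n|n-1}(g) = \bar E\big(g(Y_n) \mid \mathcal{F}^Y_{n-1}\big).
$$

We fix the following versions of these conditional expectations:

$$
\begin{aligned}
\eta_{n|n-1}(g) &= \int_{\mathbb{R}}\int_{\mathbb{R}} g(y)\,\gamma(x,y)\,\varphi(dy)\,\pi_{n-1}(dx),\\
\bar\eta_{n|n-1}(g) &= \int_{\mathbb{R}}\int_{\mathbb{R}} g(y)\,\gamma(x,y)\,\varphi(dy)\,\bar\pi_{n-1}(dx).
\end{aligned}
$$

Similarly to $\bar\pi_n$ and $\pi_n$, the predictor $\bar\eta_{n|n-1}(g)$ is well defined $P$- and $\bar P$-a.s. For $\eta_{n|n-1}(g)$, however, only the product $\varrho_{n-1}\eta_{n|n-1}(g)$ makes sense with respect to both measures.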
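Assume, for illustration, a finite state space and a finite observation alphabet with counting measure $\varphi$. Then the versions fixed above are plain matrix products, $\eta_{n|n-1}(g) = \sum_x \pi_{n-1}(x)\sum_y \gamma(x,y)\,g(y)$. The hypothetical helper below evaluates this expression. When $g$ solves (1.7) for $f$, it returns $\pi_{n-1}(f)$, which is exactly the identity behind Corollary 2.2.

```python
import numpy as np


def predictor(pi_prev, Gamma, g):
    """eta_{n|n-1}(g) = sum_x pi_{n-1}(x) sum_y gamma(x, y) g(y), phi = counting measure."""
    Gamma = np.asarray(Gamma, dtype=float)
    return float(np.asarray(pi_prev, dtype=float) @ (Gamma @ np.asarray(g, dtype=float)))


Gamma = [[0.8, 0.2], [0.3, 0.7]]
pi_prev = [0.25, 0.75]
eta = predictor(pi_prev, Gamma, [-0.4, 1.6])   # g solves (1.7) for f = (0, 1), so eta = pi_{n-1}(f)
total = predictor(pi_prev, Gamma, [1.0, 1.0])  # g = 1 gives total probability
```
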

Theorem 2.1. Assume $\nu \ll \bar\nu$ and any of the following conditions:

(i) $g$ is bounded;

(ii) $\dfrac{d\nu}{d\bar\nu}$ is bounded and the family $(g(Y_n))_{n\ge1}$ is $\bar P$-uniformly integrable;

(iii) for $p, q > 1$ with $\frac1p + \frac1q = 1$, $\bar E\Big(\dfrac{d\nu}{d\bar\nu}(X_0)\Big)^p < \infty$ and the family $(|g(Y_n)|^q)_{n\ge1}$ is $\bar P$-uniformly integrable.

Then

$$
\lim_{n\to\infty} E\,\big|\eta_{n|n-1}(g) - \bar\eta_{n|n-1}(g)\big| = 0. \tag{2.1}
$$
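Proof. Suppose that $\alpha$ is an $\mathcal{F}^Y_n$-measurable random variable defined on $(\Omega, \mathcal{F}, P)$ with $E|\alpha| < \infty$. Then $\bar E|\alpha|\varrho_n < \infty$ and

$$
\varrho_{n-1} E\big(\alpha \mid \mathcal{F}^Y_{n-1}\big) = \bar E\big(\alpha\varrho_n \mid \mathcal{F}^Y_{n-1}\big), \quad \bar P\text{-a.s.} \tag{2.2}
$$

(i) For $\alpha := g(Y_n)$, (2.2) reads

$$
\varrho_{n-1}\eta_{n|n-1}(g) = \bar E\big(g(Y_n)\varrho_n \mid \mathcal{F}^Y_{n-1}\big), \quad \bar P\text{-a.s.}
$$

Therefore, since $\varrho_{n-1}$ is nonnegative and $\mathcal{F}^Y_{n-1}$-measurable,

$$
\begin{aligned}
E\,|\eta_{n|n-1}(g) - \bar\eta_{n|n-1}(g)| &= \bar E\,\frac{dP^Y}{d\bar P^Y}\,|\eta_{n|n-1}(g) - \bar\eta_{n|n-1}(g)| = \bar E\,\varrho_{n-1}\,|\eta_{n|n-1}(g) - \bar\eta_{n|n-1}(g)|\\
&= \bar E\,\big|\varrho_{n-1}\eta_{n|n-1}(g) - \varrho_{n-1}\bar\eta_{n|n-1}(g)\big|\\
&= \bar E\,\big|\bar E\big(g(Y_n)\varrho_n \mid \mathcal{F}^Y_{n-1}\big) - \bar E\big(g(Y_n)\varrho_{n-1} \mid \mathcal{F}^Y_{n-1}\big)\big|\\
&= \bar E\,\big|\bar E\big(g(Y_n)(\varrho_n - \varrho_{n-1}) \mid \mathcal{F}^Y_{n-1}\big)\big|,
\end{aligned}
$$

and consequently

$$
E\,|\eta_{n|n-1}(g) - \bar\eta_{n|n-1}(g)| \le \bar E\,|g(Y_n)|\,|\varrho_n - \varrho_{n-1}|. \tag{2.3}
$$

In case (i), $|g| \le C$ for some constant $C$, so the right-hand side of (2.3) is at most $C\,\bar E|\varrho_n - \varrho_{n-1}|$. The sequence $(\varrho_n, \mathcal{F}^Y_n, \bar P)_{n\ge1}$ is a uniformly integrable martingale converging $\bar P$-a.s. to $\varrho_\infty = \bar E\big(\frac{d\nu}{d\bar\nu}(X_0) \mid \mathcal{F}^Y\big)$. Moreover, $\bar E|\varrho_n - \varrho_{n-1}| \le \bar E|\varrho_n - \varrho_\infty| + \bar E|\varrho_{n-1} - \varrho_\infty|$. Hence the required result follows from $\lim_{n\to\infty}\bar E|\varrho_n - \varrho_\infty| = 0$, which holds by the Scheffé theorem.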

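(ii) Set $g^C = g\,I\{|g| \le C\}$. Then, by (i),

$$
\lim_{n\to\infty} E\,\big|\eta_{n|n-1}(g^C) - \bar\eta_{n|n-1}(g^C)\big| = 0, \quad \forall\, C > 0,
$$

and it is left to show that

$$
\begin{aligned}
&\lim_{C\to\infty}\limsup_{n\to\infty} E\,\big|\eta_{n|n-1}(g - g^C)\big| = 0,\\
&\lim_{C\to\infty}\limsup_{n\to\infty} E\,\big|\bar\eta_{n|n-1}(g - g^C)\big| = 0.
\end{aligned} \tag{2.4}
$$

Let, for definiteness, $\frac{d\nu}{d\bar\nu} \le K$, and thus $\varrho_n \le K$, $\bar P$-a.s. for all $n \ge 1$. Then

$$
\begin{aligned}
E\,\big|\eta_{n|n-1}(g - g^C)\big| &\le E\,|g(Y_n)|\,I\{|g(Y_n)| > C\} \le K\,\bar E\,|g(Y_n)|\,I\{|g(Y_n)| > C\},\\
E\,\big|\bar\eta_{n|n-1}(g - g^C)\big| &= \bar E\,\varrho_{n-1}\big|\bar\eta_{n|n-1}(g - g^C)\big| \le K\,\bar E\,|g(Y_n)|\,I\{|g(Y_n)| > C\},
\end{aligned}
$$

and (2.4) holds by the uniform integrability assumption from (ii).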

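(iii) By (2.3), it suffices to show that $\lim_{n\to\infty}\bar E|g(Y_n)|\,|\varrho_n - \varrho_{n-1}| = 0$. By the Hölder inequality,

$$
\bar E\,|g(Y_n)|\,|\varrho_n - \varrho_{n-1}| \le \big(\bar E|g(Y_n)|^q\big)^{1/q}\big(\bar E|\varrho_n - \varrho_{n-1}|^p\big)^{1/p}.
$$

The $\bar P$-uniform integrability of $(|g(Y_n)|^q)_{n\ge1}$ provides $\sup_{n\ge1}\bar E|g(Y_n)|^q < \infty$. As shown in the proof of (i),

$$
\lim_{n\to\infty}\bar E|\varrho_n - \varrho_{n-1}| = 0.
$$

By Vitali's convergence theorem, it is therefore left to check that the family $(|\varrho_n + \varrho_{n-1}|^p)_{n\ge1}$, which dominates $(|\varrho_n - \varrho_{n-1}|^p)_{n\ge1}$, is $\bar P$-uniformly integrable. By the conditional Jensen inequality, $\varrho_n^p \le \bar E\big(\big(\frac{d\nu}{d\bar\nu}(X_0)\big)^p \mid \mathcal{F}^Y_n\big)$. The right-hand side is a uniformly integrable family, being the conditional expectations of a single integrable random variable. Since $|\varrho_n + \varrho_{n-1}|^p \le 2^{p-1}(\varrho_n^p + \varrho_{n-1}^p)$, the claim follows. In particular,

$$
\bar E|\varrho_n + \varrho_{n-1}|^p \le 2^{p-1}\big(\bar E\varrho_n^p + \bar E\varrho_{n-1}^p\big) \le 2^p\,\bar E\Big(\frac{d\nu}{d\bar\nu}(X_0)\Big)^p, \quad p > 1. \qquad ■
$$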
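Theorem 2.1 does not ask for any ergodicity of the signal. The sketch below checks the bound (2.3) and the decay in (2.1) exactly, by enumerating all observation paths. It uses a deliberately non-ergodic finite example: $\Lambda$ is the identity, so $X_n \equiv X_0$ and the mixing condition (1.5) fails. The model, the bounded function $g = I\{y = 1\}$ (so $|g| \le 1$), and all names are illustrative assumptions, with $\varphi$ taken as counting measure on $\{0, 1\}$.

```python
import itertools

import numpy as np


def filt(nu, Lam, Gamma, ys):
    """Return (pi_n, p(y_1..y_n)) for X_0 ~ nu, using the denominators of (1.1)."""
    pi, lik = np.asarray(nu, dtype=float), 1.0
    for y in ys:
        w = Gamma[:, y] * pi
        lik *= w.sum()
        pi = w @ Lam / w.sum()
    return pi, lik


def exact_terms(nu, nu_bar, Lam, Gamma, g, n):
    """Exact E|eta_{n|n-1}(g) - etabar_{n|n-1}(g)| and Ebar|rho_n - rho_{n-1}|."""
    Gg = Gamma @ np.asarray(g, dtype=float)   # x -> sum_y g(y) gamma(x, y)
    n_obs = Gamma.shape[1]
    lhs = 0.0
    for ys in itertools.product(range(n_obs), repeat=n - 1):
        pi, p = filt(nu, Lam, Gamma, ys)              # p = P(Y_1..Y_{n-1} = ys)
        pib, _ = filt(nu_bar, Lam, Gamma, ys)
        lhs += p * abs(pi @ Gg - pib @ Gg)
    rhs = 0.0
    for ys in itertools.product(range(n_obs), repeat=n):
        _, p_n = filt(nu, Lam, Gamma, ys)
        _, pb_n = filt(nu_bar, Lam, Gamma, ys)        # Pbar(Y_1..Y_n = ys)
        _, p_m = filt(nu, Lam, Gamma, ys[:-1])
        _, pb_m = filt(nu_bar, Lam, Gamma, ys[:-1])
        rhs += pb_n * abs(p_n / pb_n - p_m / pb_m)    # |rho_n - rho_{n-1}| on this path
    return lhs, rhs


Lam = np.eye(2)                                # non-ergodic: the signal never moves
Gamma = np.array([[0.8, 0.2], [0.3, 0.7]])
g = [0.0, 1.0]                                 # bounded, |g| <= C = 1
terms = [exact_terms([0.5, 0.5], [0.2, 0.8], Lam, Gamma, g, n) for n in range(1, 10)]
```

Enumeration costs $2^n$ paths, so this only makes sense for small $n$. It avoids any Monte Carlo error in the comparison.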

Corollary 2.2. Let $f$ be a measurable function and assume that there is a function $g$ solving (1.7) and satisfying the assumptions of Theorem 2.1. Then

$$
\lim_{n\to\infty} E\,|\pi_n(f) - \bar\pi_n(f)| = 0.
$$

Proof. Since $\pi_{n-1}(f) = \eta_{n|n-1}(g)$ and $\bar\pi_{n-1}(f) = \bar\eta_{n|n-1}(g)$, the claim is nothing but (2.1). ■

3. Examples 

3.1. Hidden Markov Chains. Let $X$ be a Markov chain taking values in a finite alphabet $S = \{a_1, \dots, a_d\}$, with the observation

$$
Y_n = \sum_{j=1}^d I\{X_{n-1} = a_j\}\,\xi_n(j),
$$

where $\xi_n(j)$, $j = 1, \dots, d$, are independent entries of the random vectors $\xi_n$, which form an i.i.d. sequence independent of $X$.

This variant of the Hidden Markov Model is popular in various applications (see e.g. [12]). Its stability analysis has been carried out by several authors (see e.g. [?], [18], [1]), mainly for an ergodic chain $X$. The nonlinear filter (1.1) is finite dimensional: the conditional distribution $\pi_n(dx)$ is just the vector of conditional probabilities $\pi_n(i) = P(X_n = a_i \mid \mathcal{F}^Y_n)$, $i = 1, \dots, d$, and

$$
\|\pi_n - \bar\pi_n\|_{tv} = \sum_{i=1}^d |\pi_n(i) - \bar\pi_n(i)|.
$$
i=l 

The following holds regardless of the ergodic properties of $X$.

Proposition 3.1. Assume

(a1) all atoms of $\bar\nu$ are positive, i.e. $\bar\nu(\{a_i\}) > 0$, $i = 1, \dots, d$;
(a2) $E|\xi_1(j)|^i < \infty$, $i, j = 1, \dots, d$;
(a3) the $d \times d$ matrix $B$ with the entries $B_{ij} = E(\xi_1(j))^i$ is nonsingular.

Then

$$
\lim_{n\to\infty} E\,\|\pi_n - \bar\pi_n\|_{tv} = 0.
$$



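Proof. Condition (ii) of Theorem 2.1 is satisfied for any $g_i(y) = y^i$, $i = 1, \dots, d$. Indeed, (a1) implies $\nu \ll \bar\nu$ and $\frac{d\nu}{d\bar\nu} \le \big(\min_j \bar\nu(\{a_j\})\big)^{-1}$. Assumption (a2) implies the uniform integrability of $(g_i(Y_n))_{n\ge1}$ for any $i$, since $|g_i(Y_n)| \le \sum_{j=1}^d |\xi_n(j)|^i$, where the right-hand side forms an i.i.d. sequence with $\sum_{j=1}^d E|\xi_1(j)|^i < \infty$. Finally,

$$
\eta_{n|n-1}(g_i) = E\big((Y_n)^i \mid \mathcal{F}^Y_{n-1}\big) = \sum_{j=1}^d \pi_{n-1}(j)\,E(\xi_1(j))^i = \sum_{j=1}^d \pi_{n-1}(j)\,B_{ij},
$$

and then, by Theorem 2.1,

$$
E\,\big|\eta_{n|n-1}(g_i) - \bar\eta_{n|n-1}(g_i)\big| = E\,\Big|\sum_{j=1}^d \big(\pi_{n-1}(j) - \bar\pi_{n-1}(j)\big)B_{ij}\Big| \xrightarrow[n\to\infty]{} 0.
$$

The latter and the nonsingularity of $B$ prove the claim. ■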
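The last step of the proof inverts $B$. Write $\eta_{n|n-1}$ and $\bar\eta_{n|n-1}$ for the vectors with entries $\eta_{n|n-1}(g_i)$ and $\bar\eta_{n|n-1}(g_i)$. Then $\pi_{n-1} - \bar\pi_{n-1} = B^{-1}(\eta_{n|n-1} - \bar\eta_{n|n-1})$, hence $\|\pi_{n-1} - \bar\pi_{n-1}\|_{tv} \le \|B^{-1}\|_1 \sum_i |\eta_{n|n-1}(g_i) - \bar\eta_{n|n-1}(g_i)|$, where $\|\cdot\|_1$ is the induced (maximum column sum) matrix norm. The sketch below illustrates this for a hypothetical $d = 2$ model in which $\xi_1(1) \sim N(0,1)$ and $\xi_1(2) \sim N(1,1)$, so that $B = \begin{pmatrix} 0 & 1 \\ 1 & 2 \end{pmatrix}$. If both entries had the same law, $B$ would be singular and the observations could not separate the two states.

```python
import numpy as np


def moment_matrix(moments):
    """B[i-1, j-1] = E xi_1(j)^i, where moments[j-1] lists E xi_1(j)^i for i = 1..d."""
    return np.array(moments, dtype=float).T


def recover_pi(B, eta):
    """Solve sum_j B_ij pi_{n-1}(j) = eta_{n|n-1}(g_i), i = 1..d, for pi_{n-1}."""
    return np.linalg.solve(B, eta)


B = moment_matrix([[0.0, 1.0],    # xi_1(1) ~ N(0, 1): first two moments
                   [1.0, 2.0]])   # xi_1(2) ~ N(1, 1)
pi, pi_bar = np.array([0.3, 0.7]), np.array([0.6, 0.4])
eta, eta_bar = B @ pi, B @ pi_bar                       # predictors of y and y^2
tv = np.abs(recover_pi(B, eta) - recover_pi(B, eta_bar)).sum()
bound = np.linalg.norm(np.linalg.inv(B), 1) * np.abs(eta - eta_bar).sum()
B_singular = moment_matrix([[0.0, 1.0], [0.0, 1.0]])    # identical noise laws
```
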

3.2. Observations with multiplicative white noise. This example is borrowed from [13]. The signal process is defined by the linear recursive equation

$$
X_n = aX_{n-1} + \theta_n,
$$

where $|a| < 1$ and $(\theta_n)_{n\ge1}$ is a $(0, b^2)$-Gaussian white noise independent of $X_0$, so that the signal process is ergodic. The distribution $\nu$ has a density $q(x)$ relative to $dx$ from the Serial Gaussian (SG) family:

$$
q(x) = \sum_{i\ge0}\alpha_i\,\frac{x^{2i}}{\sigma^{2i}C_{2i}}\,\frac{1}{\sqrt{2\pi}\,\sigma}\exp\Big(-\frac{x^2}{2\sigma^2}\Big),
$$

where $\sigma$ is the scaling parameter, the $\alpha_i$'s are nonnegative weight coefficients with $\sum_{i\ge0}\alpha_i = 1$, and $C_{2i}$ are the normalizing constants. The observation sequence is given by

$$
Y_n = X_{n-1}\xi_n,
$$

where $(\xi_n)_{n\ge1}$ is a sequence of i.i.d. random variables independent of $X$. The distribution of $\xi_1$ is assumed to have the following density relative to $dx$:

$$
p(x) = \frac{\rho}{|x|^3}\exp\Big(-\frac{\rho}{x^2}\Big), \qquad p(0) = 0, \tag{3.1}
$$

where $\rho$ is a positive constant. This filtering model is motivated by financial applications, where $|X|$ is interpreted as the stochastic volatility parameter of an asset price.
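Two facts about (3.1) are used below. Substituting $u = \rho/x^2$ shows that $\rho/\xi_1^2$ is exponentially distributed with mean one, i.e. $P(|\xi_1| > t) = 1 - e^{-\rho/t^2}$. Substituting $v = 1/x$ gives $E|\xi_1| = 2\rho\int_0^\infty e^{-\rho v^2}\,dv = \sqrt{\pi\rho}$. The tail $p(x) \sim \rho|x|^{-3}$ also shows that $E|\xi_1|^{1+\varepsilon} < \infty$ exactly when $\varepsilon < 1$. The sketch below uses a hypothetical value $\rho = 1$, samples $\xi_1$ through the exponential representation, and checks these statements numerically.

```python
import numpy as np

RHO = 1.0  # hypothetical value of the positive constant in (3.1)


def p_density(x, rho=RHO):
    """Density (3.1): p(x) = rho / |x|^3 * exp(-rho / x^2), with p(0) = 0."""
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    nz = x != 0
    out[nz] = rho / np.abs(x[nz]) ** 3 * np.exp(-rho / x[nz] ** 2)
    return out


def sample_xi(n, rho=RHO, seed=0):
    """Sample from (3.1): rho / xi^2 is Exp(1) and the sign is symmetric."""
    rng = np.random.default_rng(seed)
    e = rng.exponential(1.0, size=n)
    sign = rng.choice([-1.0, 1.0], size=n)
    return sign * np.sqrt(rho / e)


def trapezoid(y, x):
    """Plain trapezoidal rule (avoids depending on a particular NumPy version)."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)


x = np.linspace(1e-6, 1000.0, 1_000_001)        # p is even: integrate over x > 0 and double
mass = 2 * trapezoid(p_density(x), x)            # close to 1 (truncation error about RHO / 1000**2)
mean_abs = 2 * trapezoid(x * p_density(x), x)    # close to sqrt(pi * RHO), minus a tail of about 2 * RHO / 1000
xi = sample_xi(200_000)
tail_emp = np.mean(np.abs(xi) > 1.5)
tail_exact = 1 - np.exp(-RHO / 1.5 ** 2)
```

Because $\xi_1$ has infinite variance, the mean is checked by quadrature rather than by a sample average.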

As proved in [13], the filter (1.1) admits a finite dimensional realization provided that $\alpha_j = 0$, $j > N$, for some integer $N \ge 1$. Namely, for any time $n \ge 1$ the filtering distribution $\pi_n(dx)$ has a density of SG type with the scaling parameter $\sigma_n$ and the weights $\alpha_{i,n}$, which are propagated by a finite (growing with $n$) set of recursive equations driven by the observations. Thus, the evolution of $\pi_n(dx)$ is completely determined via $\sigma_n$ and $\alpha_{i,n}$. Some stability analysis for the sequence $(\sigma_n, (\alpha_{i,n})_{i\ge1})_{n\ge1}$ has been done in [?].

Assume that the density $q(x)$ of $\nu$ belongs to the SG family, but its parameters are unknown. If the filter is started from the Gaussian density $\bar q(x)$ with zero mean and variance $\bar\sigma^2$, the filtering equation remains finite dimensional, and the density

$$
\frac{d\nu}{d\bar\nu}(x) = \frac{q(x)}{\bar q(x)} = \Big(\sum_{i\ge0}\alpha_i\,\frac{x^{2i}}{\sigma^{2i}C_{2i}}\Big)\frac{\bar\sigma}{\sigma}\exp\Big(-\frac{x^2}{2}\Big(\frac{1}{\sigma^2} - \frac{1}{\bar\sigma^2}\Big)\Big)
$$

is bounded if $\bar\sigma > \sigma$.
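The boundedness claim can be seen numerically. The sketch below assumes that $q(x) = \sum_i \alpha_i \frac{x^{2i}}{\sigma^{2i}C_{2i}}\frac{1}{\sqrt{2\pi}\sigma}e^{-x^2/(2\sigma^2)}$ with $C_{2i} = (2i-1)!!$, which makes each term a probability density. It also uses hypothetical weights $\alpha = (0.5, 0.3, 0.2)$ and $\sigma = 1$. Under these assumptions the ratio to the $N(0, \bar\sigma^2)$ density is $\frac{\bar\sigma}{\sigma}\big(\sum_i\alpha_i\frac{x^{2i}}{\sigma^{2i}C_{2i}}\big)\exp\big(-\frac{x^2}{2}(\sigma^{-2} - \bar\sigma^{-2})\big)$. For $\bar\sigma > \sigma$ the exponent is negative and beats the polynomial factor, while for $\bar\sigma < \sigma$ the ratio grows without bound. The code also cross-checks the closed form against $q/\bar q$ at moderate $x$.

```python
from math import prod

import numpy as np


def c2i(i):
    """(2i - 1)!! = E zeta^(2i) for a standard Gaussian zeta (equals 1 for i = 0)."""
    return prod(range(1, 2 * i, 2))


def sg_density(x, alphas, sigma):
    """SG density q(x) with weights alphas[i] and scaling parameter sigma (assumed form)."""
    x = np.asarray(x, dtype=float)
    poly = sum(a * (x / sigma) ** (2 * i) / c2i(i) for i, a in enumerate(alphas))
    return poly * np.exp(-x ** 2 / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma)


def gauss_density(x, sigma_bar):
    x = np.asarray(x, dtype=float)
    return np.exp(-x ** 2 / (2 * sigma_bar ** 2)) / (np.sqrt(2 * np.pi) * sigma_bar)


def density_ratio(x, alphas, sigma, sigma_bar):
    """Closed form of dnu/dnubar(x) = q(x) / qbar(x)."""
    x = np.asarray(x, dtype=float)
    poly = sum(a * (x / sigma) ** (2 * i) / c2i(i) for i, a in enumerate(alphas))
    return poly * (sigma_bar / sigma) * np.exp(-x ** 2 / 2 * (1 / sigma ** 2 - 1 / sigma_bar ** 2))


alphas, sigma = [0.5, 0.3, 0.2], 1.0
x = np.linspace(-50.0, 50.0, 100_001)
bounded = density_ratio(x, alphas, sigma, sigma_bar=2.0)                          # sigma_bar > sigma
growing = density_ratio(np.array([5.0, 10.0, 20.0]), alphas, sigma, sigma_bar=0.8)  # sigma_bar < sigma
x_mid = np.linspace(-5.0, 5.0, 11)
ratio_direct = sg_density(x_mid, alphas, sigma) / gauss_density(x_mid, 2.0)
```
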
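In terms of the setting under consideration, $\gamma(x,y) = \frac{1}{|x|}\,p(y/x)$, where $p(\cdot)$ is defined in (3.1), and $\varphi(dy) = dy$. For $f(x) := |x|$,

$$
\eta_{n|n-1}(f) = E\big(f(Y_n) \mid \mathcal{F}^Y_{n-1}\big) = \pi_{n-1}(f)\,E|\xi_1|,
$$

where $E|\xi_1| > 0$, and hence $g(y) = |y|/E|\xi_1|$ solves (1.7). Finally, $(g(Y_n))_{n\ge1}$ is a $\bar P$-uniformly integrable family since, for $\varepsilon \in (0,1)$,

$$
\bar E\,|X_{n-1}\xi_n|\,I\{|X_{n-1}\xi_n| > C\} \le \frac{\bar E|X_{n-1}|^{1+\varepsilon}\,E|\xi_1|^{1+\varepsilon}}{C^{\varepsilon}}.
$$

Here $E|\xi_1|^{1+\varepsilon} < \infty$ for $\varepsilon \in [0,1)$, and $\sup_{n\ge1}\bar E|X_{n-1}|^{1+\varepsilon} < \infty$ is implied by $|a| < 1$ and $\int_{\mathbb{R}}|x|^{1+\varepsilon}\,\bar\nu(dx) < \infty$. Thus, by Corollary 2.2 (condition (ii) of Theorem 2.1 holds when $\bar\sigma > \sigma$),

$$
\lim_{n\to\infty} E\,\Big|\int_{\mathbb{R}}|x|\,\pi_n(dx) - \int_{\mathbb{R}}|x|\,\bar\pi_n(dx)\Big| = 0.
$$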



3.3. Additive observation noise. Suppose

$$
Y_n = h(X_{n-1}) + \xi_n, \tag{3.2}
$$

where $h$ is a fixed measurable function and $\xi = (\xi_n)_{n\ge1}$ is an i.i.d. sequence of random variables independent of $X$. Since

$$
E\big(Y_n \mid \mathcal{F}^Y_{n-1}\big) = \pi_{n-1}(h) + E\xi_1,
$$

the function $g(y) := y - E\xi_1$ solves (1.7) with $f = h$. Hence, if one of the integrability conditions in Theorem 2.1 is satisfied for $g(y) := y$, then (1.2) holds true for $h$:

$$
\lim_{n\to\infty} E\,|\pi_n(h) - \bar\pi_n(h)| = 0. \tag{3.3}
$$

Remark 3.2. (3.3) resembles the result of J.M.C. Clark et al. [5] in the continuous time setting. For a general Markov signal $X = (X_t)_{t\ge0}$ and observations $Y = (Y_t)_{t\ge0}$ of the form (1.3), they show

$$
E\int_0^\infty \big(\pi_t(h) - \bar\pi_t(h)\big)^2\,dt \le 2\int_{\mathbb{R}} \log\frac{d\nu}{d\bar\nu}(x)\,\nu(dx).
$$

This bound is verified by information-theoretic arguments.

3.3.1. Linear observations h(x) = x. Consider the linear observation model (3.2) with $h(x) = x$:

$$
Y_n = X_{n-1} + \xi_n.
$$

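Proposition 3.3. Assume

(A1) $\dfrac{d\nu}{d\bar\nu} \le c < \infty$;
(A2) $(|X_n|^p)_{n\ge0}$ is $\bar P$-uniformly integrable for some $p \ge 1$;
(A3) $|E e^{it\xi_1}| > 0$ for all $t \in \mathbb{R}$.

Then for any continuous function $f(x)$, $x \in \mathbb{R}$, growing not faster than a polynomial of order $p$,

$$
\lim_{n\to\infty} E\,|\pi_n(f) - \bar\pi_n(f)| = 0.
$$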



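Proof. If $f$ is an unbounded function, it can be approximated by a sequence of bounded functions $f_\ell$, $\ell \ge 1$, with $f_\ell(x) = g_\ell(f(x))$, where

$$
g_\ell(x) = \begin{cases} x, & |x| \le \ell,\\ \ell\,\mathrm{sign}(x), & |x| > \ell. \end{cases}
$$

Further, for $k = 1, 2, \dots$, set

$$
f_{\ell,k}(x) = \begin{cases} f_\ell(x), & |x| \le k-1,\\ \tilde f_{\ell,k}(x), & k-1 < |x| \le k,\\ 0, & |x| > k, \end{cases}
$$

where $\tilde f_{\ell,k}(x)$ is chosen so that the function $f_{\ell,k}(x)$ is continuous and

$$
|f_{\ell,k}(x)| \le |f_\ell(x)|.
$$

By the second Weierstrass approximation theorem (see e.g. [?]), for any positive integer $m$ one can choose a trigonometric polynomial $P_{m,\ell,k}(x)$ with period $2k$ such that

$$
\max_{x\in[-k,k]} |f_{\ell,k}(x) - P_{m,\ell,k}(x)| \le \frac1m.
$$

Since $P_{m,\ell,k}(x)$ is a periodic function,

$$
|P_{m,\ell,k}(x)| \le \frac1m + \max_{|y|\le k}|f_{\ell,k}(y)| \le \frac1m + \ell \quad \text{for any } |x| > k.
$$

We use the triangle inequality for

$$
f = P_{m,\ell,k} + (f_{\ell,k} - P_{m,\ell,k}) + (f_\ell - f_{\ell,k}) + (f - f_\ell),
$$

together with the estimates

$$
\begin{aligned}
|f_{\ell,k} - P_{m,\ell,k}| &\le \tfrac1m\, I\{|x| \le k\} + \big(\tfrac1m + \ell\big) I\{|x| > k\},\\
|f_\ell - f_{\ell,k}| &\le 2\ell\, I\{|x| > k-1\},\\
|f - f_\ell| &\le C(1 + |x|^p)\,I\{C(1 + |x|^p) > \ell\} \quad \text{for some constant } C > 0.
\end{aligned}
$$

This gives the upper bound

$$
|f - P_{m,\ell,k}| \le \frac1m + 3\ell\,I\{|x| > k-1\} + C(1 + |x|^p)\,I\{C(1 + |x|^p) > \ell\},
$$

implying

$$
\begin{aligned}
E\,|\pi_n(f) - \bar\pi_n(f)| \le{}& E\,|\pi_n(P_{m,\ell,k}) - \bar\pi_n(P_{m,\ell,k})| + \frac2m + 3\ell\,E\int_{\{|x|>k-1\}}\big[\pi_n(dx) + \bar\pi_n(dx)\big]\\
&+ C\,E\int_{\{C(1+|x|^p)>\ell\}}(1 + |x|^p)\big[\pi_n(dx) + \bar\pi_n(dx)\big].
\end{aligned}
$$

Thus, letting $n \to \infty$, then $k \to \infty$, $\ell \to \infty$, and finally using the arbitrariness of $m$, we obtain the desired result provided that

(1) $\lim_{n\to\infty} E\,|\pi_n(P_{m,\ell,k}) - \bar\pi_n(P_{m,\ell,k})| = 0$, $\forall\, m, \ell, k$;
(2) $\lim_{k\to\infty}\limsup_{n\to\infty} \ell\,E\int_{\{|x|>k-1\}}\big[\pi_n(dx) + \bar\pi_n(dx)\big] = 0$, $\forall\, \ell$;
(3) $\lim_{\ell\to\infty}\limsup_{n\to\infty} E\int_{\{C(1+|x|^p)>\ell\}}(1 + |x|^p)\big[\pi_n(dx) + \bar\pi_n(dx)\big] = 0$.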
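Condition (1) holds due to

$$
E\big(e^{itY_n} \mid \mathcal{F}^Y_{n-1}\big) = E\big(e^{itX_{n-1}} \mid \mathcal{F}^Y_{n-1}\big)\,E e^{it\xi_1} = \pi_{n-1}(e^{itx})\,E e^{it\xi_1}
$$

and the assumption (A3). For every $t \in \mathbb{R}$, the bounded function $g(y) = e^{ity}/E e^{it\xi_1}$ solves (1.7) with $f(x) = e^{itx}$. Hence, by Theorem 2.1 applied to the real and imaginary parts of $g$,

$$
\lim_{n\to\infty} E\,|\pi_n(e^{itx}) - \bar\pi_n(e^{itx})| = 0, \quad \forall\, t \in \mathbb{R},
$$

and $P_{m,\ell,k}$ is a finite linear combination of such exponentials.

For a nonnegative function $\phi$, $E\pi_n(\phi) = \bar E\frac{d\nu}{d\bar\nu}(X_0)\,\phi(X_n)$ and $E\bar\pi_n(\phi) = \bar E\varrho_n\phi(X_n)$, while $\varrho_n \le c$ under (A1). Hence, condition (2) is implied by the Chebyshev inequality

$$
E\int_{\{|x|>k-1\}}\big[\pi_n(dx) + \bar\pi_n(dx)\big] = \bar E\Big(\frac{d\nu}{d\bar\nu}(X_0) + \varrho_n\Big) I\{|X_n| > k-1\} \le \frac{2c}{k-1}\,\bar E|X_n|, \quad k \ge 2,
$$

and the assumptions (A1) and (A2).

Condition (3) follows from

$$
E\int_{\{C(1+|x|^p)>\ell\}}(1 + |x|^p)\big[\pi_n(dx) + \bar\pi_n(dx)\big] \le 2c\,\bar E\, I\{C(1 + |X_n|^p) > \ell\}(1 + |X_n|^p)
$$

and the assumptions (A1) and (A2). ■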
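Step (1) of the proof relies on approximating $f_{\ell,k}$ uniformly on $[-k,k]$ by a $2k$-periodic trigonometric polynomial. The theorem only asserts that such a polynomial exists; Fejér means are one standard constructive way to obtain it. The sketch below makes several illustrative choices. It takes $\tilde f_{\ell,k}(x) = f_\ell(x)(k - |x|)$ on $k-1 < |x| \le k$, which meets the continuity and domination requirements, together with $f(x) = x^2$, $\ell = 4$, and $k = 5$. It then computes Fourier coefficients by the rectangle rule on a uniform grid. Because the Fejér kernel is nonnegative, the resulting polynomial also satisfies $|P| \le \max|f_{\ell,k}| \le \ell$, in line with the periodicity bound used in the proof.

```python
import numpy as np


def f_lk(x, f, ell, k):
    """Hypothetical f_{ell,k}: clip f at level ell, then taper linearly to 0 on k-1 < |x| <= k."""
    x = np.asarray(x, dtype=float)
    f_ell = np.clip(f(x), -ell, ell)              # f_ell = g_ell(f(x))
    taper = np.clip(k - np.abs(x), 0.0, 1.0)      # 1 for |x| <= k-1, 0 for |x| >= k
    return f_ell * taper


def fejer_mean(values, x_grid, k, N, x_eval):
    """Order-N Fejer mean of the 2k-periodic function sampled on a uniform grid over [-k, k)."""
    js = np.arange(-N + 1, N)
    # c_j ~ (1/2k) int_{-k}^{k} h(x) exp(-i pi j x / k) dx, by the rectangle rule
    coeffs = np.array([np.mean(values * np.exp(-1j * np.pi * j * x_grid / k)) for j in js])
    weights = 1.0 - np.abs(js) / N                # Fejer (Cesaro) weights
    phases = np.exp(1j * np.pi * np.outer(x_eval, js) / k)
    return np.real(phases @ (weights * coeffs))


ell, k = 4.0, 5.0
x_grid = np.linspace(-k, k, 4096, endpoint=False)
h = f_lk(x_grid, lambda x: x ** 2, ell, k)
errors, sup_norms = [], []
for N in (8, 32, 128):
    P = fejer_mean(h, x_grid, k, N, x_grid)
    errors.append(np.max(np.abs(P - h)))
    sup_norms.append(np.max(np.abs(P)))
```

For a function with corners, the uniform error of the Fejér means decays slowly, roughly like $\log N / N$. That rate is sufficient here, because the proof needs only some polynomial within $1/m$ for each fixed $m$.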

Acknowledgement. We are grateful to Valentine Genon-Catalot for bringing [13] to our attention.